Model-Driven Tile Size Selection for DOACROSS Loops on GPUs

نویسندگان

Peng Di

Jingling Xue

چکیده

DOALL loops are tiled to exploit DOALL parallelism and data locality on GPUs. In contrast, due to loop-carried dependences, DOACROSS loops must be skewed first in order to make tiling legal and exploit wavefront parallelism across the tiles and within a tile. Thus, tile size selection, which is performance-critical, becomes more complex for DOACROSS loops than DOALL loops on GPUs. This paper presents a model-driven approach to automating this process. Validation using 1D, 2D and 3D SOR solvers shows that our framework can find the tile sizes for these representative DOACROSS loops to achieve performances close to the best observed for a range of problem sizes tested.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

An Empirical Study on DOACROSS Loops

Loop-iteration level parallelism is one of the most common forms of parallelism being exploited by optimizing compilers and parallel machines. In this study, we selected 6 large application programs and used an execution-driven simulation technique from MaxPar 5] to identify and to measure the eeectiveness of concurrent DOACROSS loops execution. It was found that executing DOACROSS loops serial...

متن کامل

Run - Time Selection of Block Size in Pipelined

Parallelizing compiler technology has improved in recent years. One area in which compilers have made progress is in handling DOACROSS loops, where cross-processor data dependencies can inhibit eecient parallelization. In regular DOACROSS loops, where dependencies can be determined at compile time, a useful parallelization technique is pipelining, where each processor (node) performs its comput...

متن کامل

On Effective Execution of Nonuniform DOACROSS Loops

It is extremely difficult to parallelize DOACROSS loops with non-uniform loop-carried dependences. In this paper, we present a static scheduling scheme with an accompanying synchronization strategy that can execute such DOACROSS loops effectively and efficiently. Our approach uses one of the parallelization techniques called Dependence Uniformization, which finds a small set of uniform dependen...

متن کامل

Run-Time Selection of Block Size in Pipelined Parallel Programs

Parallelizing compiler technology has improved in recent years. One area in which compilers have made progress is in handling DOACROSS loops, where crossprocessor data dependencies can inhibit e cient parallelization. In regular DOACROSS loops, where dependencies can be determined at compile time, a useful parallelization technique is pipelining, where each processor (node) performs its computa...

متن کامل

Redundant Synchronization Elimination for DOACROSS Loops

Synchronizations are necessary when there are dependences between concurrent processes. However, many synchronizations are redundant because the composite effect of the other synchronizations may have already covered them. The problem of redundant synchronization elimination in DOACROSS loops is investigated and an efficient algorithm that identifies redundant synchronizations in multiply-neste...

متن کامل

ذخیره در منابع من

ذخیره در منابع من قبلا به منابع من ذحیره شده

{@ msg_add @}

با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره شماره

صفحات -

تاریخ انتشار 2011

Model-Driven Tile Size Selection for DOACROSS Loops on GPUs

نویسندگان

چکیده

منابع مشابه

An Empirical Study on DOACROSS Loops

Run - Time Selection of Block Size in Pipelined

On Effective Execution of Nonuniform DOACROSS Loops

Run-Time Selection of Block Size in Pipelined Parallel Programs

Redundant Synchronization Elimination for DOACROSS Loops

عنوان ژورنال:

اشتراک گذاری